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Abstract. In climate studies, detecting spatial patterns that 
largely deviate from the sample mean still remains a sta- 
tistical challenge. Although a Principal Component Anal- 
ysis (PCA), or equivalently a Empirical Orthogonal Func- 
tions (EOF) decomposition, is often applied on this purpose, 
it can only provide meaningful results if the underlying mul- 
tivariate distribution is Gaussian. Indeed, PCA is based on 
optimizing second order moments quantities and the covari- 
ance matrix can only capture the full dependence structure 
for multivariate Gaussian vectors. Whenever the application 
at hand can not satisfy this normality hypothesis (e.g. precip- 
itation data), alternatives and/or improvements to PCA have 
to be developed and studied. 

To go beyond this second order statistics constraint that 
limits the applicability of the PCA, we take advantage of the 
cumulant function that can produce higher order moments 
information. This cumulant function, well-known in the sta- 
tistical literature, allows us to propose a new, simple and fast 
procedure to identify spatial patterns for non-Gaussian data. 
Our algorithm consists in maximizing the cumulant function. 
To illustrate our approach, its implementation for which ex- 
plicit computations are obtained is performed on three fam- 
ily of of multivariate random vectors. In addition, we show 
that our algorithm corresponds to selecting the directions 
along which projected data display the largest spread over 
the marginal probability density tails. 

Keywords. Pattern Analysis, Cumulant Function, Multi- 
variate Gaussian, Skew-normal and Gamma vectors 



1 Introduction 

In geosciences. Principal Component Analysis (PCA) has 
been an essential and powerful tool at detecting spatial struc- 



tures amongst time series recorded at different locations. 
PCA aims at building a decorrelated representation of the 
data and it finds the spati al patterns respon sible for the largest 
proportion of variability iRenchen (Il998h . PCA can also be 
viewed as a dimensionality reduction technique that extracts 
the most relevant components of data, i.e. the ones that max- 
imize the variances. Hence, second order moments are the 
foundation of PCA. But relying exclusively on second mo- 
ments implies that PCA is only optimal when applied to mul- 
tivariate Gaussian vectors. Although rarely stated and even 
more rarely checked, this underlined normality assumption 
is not always satisfied in practice. 

Recently, different approaches have been tested to extend 
the applicability of PCA in geosciences. In particular, Non- 
Linear PCA (NLP CA) has been applied to several geop hys- 



ical datasets (e.g., iHsiehl l2004t iMonahan et al.L 1200 Ih . In 



the NLPCA algorithm, data are considered as the input of an 
auto-associative neural n etwork with fiv e layers, with a bot- 
tleneck in the third layer lKramen (119911) . Through the mini- 
mization of a cost function, the output is forced to be as close 
as possible to the input, and the bottleneck layer is a low di- 
mensional representation of the input. Since this neural net- 
work is nonlinear, NLPCA goes automatically beyond corre- 
lations. However, NLPCA s uffers the intr insic limita tions of 
multilayered networks (e.g., Christiansenl [2005; Ma lthouse , 



19981) : it is computationally expensive and does not always 
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converge to a global solution. To overcome these difficul- 
ties, we pursue a less ambitious aim: instead of trying to find 
decompositions that can explain the entire body of data with 
respect to a criterion, we focus on the part of data responsible 
for large anomalous behaviors. 

In contrast to PCA, our approach tends to give maximal 
weight to data points which largely deviate from the mean, 
and to find the corresponding representative spatial patterns, 
i.e. the directions along which such points are prominently 
distributed. The key element in our procedure is the expan- 
sion of the cumulant function that can provide information 
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beyond t he first two moments. By ma ximizing the cumulant 
function (IKenney and Keepingl Il95lh over growing hyper- 
spheres in the data space, a set of components can be derived. 
We first apply our procedure to three types of multivariate 
random vectors (Normal, Skew-Normal, Gamma). Then we 
demonstrate in the general case that if a direction exists for 
which the marginal probability density of projected data dis- 
play a larger tail than in all other directions, then our proce- 
dure is able to select that direction. 

When finding the first component, we show that PCA is a 
special case of our approach whenever the Gaussian assump- 
tion is satisfied. Besides the Gaussian case we show that, for 
any probabiHty density, the solution derived from the cen- 
tered cumulant function can be transformed to the first prin- 
cipal component by decreasing the radius of the hypersphere. 
Other principal components could be found as well, by a gen- 
eraUzation of the proposed method. Our method is compu- 
tationally cheap, and the solutions are found in the form of 
unit (normalized) vectors, as in the case of PCA, allowing a 
unidimensional projection with an easy geometrical interpre- 
tation. 

In summary, this paper focuses on the problem of char- 
acterizing spatial patterns associated to large anomalies, i.e. 
large deviations from the sample mean, when the data set 
under study cannot be assumed to be normally distributed. 

2 Maximizing the cumulant function 

In the univariate case, the cumulant function of the random 
variable X with finite moments is defined as the following 
scalar function 



log{E[exp(sX)]} 



where s e R, E(.) represents the mean function and the 
scalar k„ corresponds to the n*'* cumulant of X. 

The first two cumulants k,i and K2 are simply the mean 
and the variance of X, respectively. The third and fourth cu- 
mulants are classically called the skewness and the kurtosis 
parameters. Concerning the existence of cumulants, we as- 
sume in this paper that all the cumulant coefficients are finite 
and that the cumulant function is always well defined. The 
cumulant function and its coefficients have many interesting 
properties. For example, if X and Y are two independent 
random variables then the n*^ cumulant of the sum X + Y 
is equal to the sum of the n*'* cumulant of X and the n*'' 
cumulant of Y for any integers n. If X follows a Gaussian 
distribution, then all but the first two cumulants are equal to 
zero. In a multivariate framework, the cumulant function of 
the random vector X ~ {Xi , . . . , Xmf is simply defined as 



log|E[exp(s*X)]|,forall s* = (si,. 



(1) 



hold, but the cumulant coefficients formulas are more cum- 
bersome to write down in a multivariate framework. For 
more information about cu mulants, we refer the reader to 
Kennev and Keeping! ( Il95lh . 

To identify possible favorite projection directions with re- 
spect to the multivariate cumulant function, we first rewrite 
the vector s = (si , . . . , s™)* in (|T|) as the product s ~ |s| x 0, 
where |sp = ^ sf, the scalar |s| represents the norm ("ra- 
dius") of s and 9 is the unit "angular/direction" vector de- 
fined as 9i = Sil\s\ (note that 9^9 = 1). Secondly, the 
cumulant function for our vector X = (Xi, . . . , Xmf pro- 
jected along the direction vector 9 is introduced as 

G|,|(0) =log{E[exp(|s| 0*X)]}, 



n! 



(2) 



As in the univariate case, the linear and Gaussian proper- 
ties associated to the cumulant function defined by (HJ still 



Our algorithmic strategy is to maximize the cumulant func- 
tion at fixed non-small |s| with respect to the angular compo- 
nent 9 that varies over an unit hypersphere. Practically, we 
have to find the optimal 9g directions defined by 

9s = argmax [G|s|(^) such that 9^9 = l] , for any |s|. 

(3) 

If the radius |s| is small enough, and the mean fci is zero, 
the variance k2{9) dominates the centered cumulant func- 
tion and the contributions of other cumulants can be ne- 
glected. In this situation, finding the first PCA component 
can be viewed as a special case of this optimization proce- 
dure, because maximizing the cumulant function for small 
|s| is equivalent to maximize the variance. As the value of 
|s| grows, higher and higher order cumulants become more 
dominant. Our main goal is to find 9s in ([3]) for the largest 
admissible |s| and study their properties. We will call the 
solutions of such an optimization scheme the Maxima of the 
Cumulant Function (MCF) directions. 

We anticipate that, since the scalar product 6'*X is invari- 
ant under orthogonal transformations, the cumulant function 
is invariant as well. Given that the unit hypersphere is also 
invariant, our algorithm is symmetric respect to orthogonal 
transformations. For instance, if data vectors are rotated by a 
given angle, the solutions of the algorithm, in terms of 9, are 
rotated by the same amount. This symmetry implies that if 
the probability density is isotropic, then the cumulant func- 
tion is isotropic as well, and no relative maxima of G exist on 
the unit hypersphere. In that case no directions are selected: 
for a rotationally symmetric distribution there is indeed no 
preferred direction along which anomalies are prominent, 
they are distributed uniformly over all angles. 

To illustrate our optimization procedure we derive explicit 
cumulant maximization schemes for three special cases of 
multivariate family distributions, see Section [3] For as- 
sessing the outputs of our algorithm when applied to real 
dat a, we refer th e reader to the second part of this paper 
iBernacchia et alJ (l2007i) ). 
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3 Theoretical examples 



To study the properties of the maximization method devel- 
oped in Section |2] three examples of distribution functions 
are considered in this paper We choose these three families 
because explicit results can be derived and they have been 
classically used in the statistical modeling of temperatures 
and precipitation data. 

Without loss of generality, some relevant matrices are di- 
agonal thereafter. However, since the solutions are invariant 
respect to orthogonal transformations, they may be rotated 
along with the corresponding coordinate change. Analytical 
calculations are performed for the general multivariate case, 
while figures are given for the bivariate case. 

3.1 Multivariate Gaussian vectors 

Suppose that the data at hand can be appropriately fitted by 
a multivariate Gaussian vector. We assume that the observa- 
tions have been centered (zero) mean and we denote the co- 
variance matrix as S. The cumulant function of th e centered 
Gaussian vector (e.g., Kennev and KeepinS, 1951 ) is equal to 



log{E[exp(s*X)]} 



1 



-s'Es. 



Hence, it is easy to show that all cumulants but the second 
are equal to zero. The decomposition in Equation s = 
|s| X 6, implies that the cumulant function becomes 



2 ^ ' 2 



Our optimization problem is to maximize G|s| {6) under the 
constraint 9*9 = 1. To find the optimal Og defined by (O, 
we introduce a function to be maximized, constrained by the 
Lagrange multiplier A, as 



6»*S0 - A(6/*0 - 1). 
Setting the gradient with respect to 9 to zero gives 



_ 2X6 = 0. 

2 



(4) 



Let As be the largest eigenvalue of the covariance matrix S 
and 0s its associated eigenvector. Introducing A^ 



Ay 



we can write -l^S^s = Ag^s- Consequently, 0e is the 
solution of (IDi. Note that 9^ depends on S but not on |s|. 
The optimal direction, for the Gaussian case, is 9s = 9^, 
and it corresponds to the classical first principal component. 

To illustrate this result, a bivariate vector of the normal 
distribution is presented in Figure[T](in which a contour plot 
is drawn in logarithmic scale). The matrix S is assumed to 
be diagonal, with entries 1.2 and 0.5143. The first princi- 
pal component (PCI), corresponding to the eigenvalue 1.2 




Fig. 1. Isoprobability contours of the multivariate Gaussian distri- 
bution, with zero mean and variances 1.2 and 0.5143 (entries of 
the diagonal covariance matrix). The first principal component is 
shown (PCI), together with the two (opposite) maxima of the cu- 
mulant function (MCFl and MCF2). All vectors are in arbitrary 
scale. The maxima of cumulant function are parallel to the first 
principal component, all pointing towards the large anomalies, in 
terms of high probability (at fixed vector norm). Probability con- 
tours are 10"^10"^10"^ . . . 10"" 



is horizontal, while the second principal component, corre- 
sponding to the eigenvalue 0.5143 is vertical, and is not dis- 
played. The two maxima of the cumulant function (MCFl 
and MCF2) are just the positive and negative part of PCI. 
Indeed, both 9-s and —9^ are solutions of (|4|. While this is 
always true for PCA, the maxima of the cumulant function 
may in general neither be parallel nor orthogonal. 

PCI is indeed the direction along which large anomalies 
are ditributed in the Gaussian case. In order to derive PC2 
and other higher principal components from the cumulant 
function, one would have to determine not only its maxima, 
but also its minima and saddle points. 

3.2 Multivariate Skew-Normal vectors 

To introduce skewness to the Gaussian density, while keep- 
ing some of valuable properties o f the normal distribution . 



Azza l ini and his co-authors (e.g lAzzalini and Dalla Valle , 
J996; Azzalini and Capitaniol 1999t Gonzalez -Farias et al. 



i2004l) have extended the normal density to a larger class 
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called the Skew-Normal (SN) density, that is defined as 

/(x) = 20s(x)$(a*x) (5) 

where ^ is a multivariate Normal probability density func- 
tion with zero mean and covariance matrix S, and $ is the 
cumulative density function of an univariate Gaussian ran- 
dom variable with zero mean and unit variance. The vector 
a corresponds to the degree of skewness. When a. — Q, 
there is no skewness, and the SN distribution reduces to the 
Gaussian cas e. From it i s possible to derive the cumulant 
function (e.g lAzzalini and D alla Valle, 1996) of a SN vector 



log{E[exp(s*X)]} = is*I]s + log(2$(yf/x*s)) 
where jjb represents the mean vector, and is equal to 



^f(l + a*S]a)' 

Note that the co variance matrix of the SN distr ibution is not 
E but E - /x/x* dAzzaUni and Capitani"o[ll999l) . 

A bivariate example of the SN distribution is presented in 
Figure |2] in which a contour plot is drawn in logarithmic 
scale. The matrix E is chosen to be diagonal with entries 
1.2 and 0.5143. The skewness vector a is taken to be equal 
to (4.365,-1.455). The distribution in Figure |2]has been 
centered such that the mean vector is zero. In the bivariate 
example, ^ is (0.8367, -0.1195). 



Fro m the SN cumulant function ( I Azzalini and Capitanio , 
1999I) . we can write the cumulant function of the centered 
vector 0*(X — /x) as 



|s|/x*0- 



-0*E6»+log 



2<I.(yf|s|/x*0) 



(6) 



While it is not possible to find explicit solutions of the max- 
imization problem defined by (O for one can provide 
valuable approximated solutions for both small and large | s | . 
In the former case, the following two Taylor expansions are 
the key elements to derive our results 



log(l 



1 



— + o{s'^) and 
2s 



1 + erf(s) = 1 + — + o{s'') 
where erf corresponds to the error function defined by 

2a>(s) = l + erf(-^). 
Then we can write the following approximation 



(7) 
(8) 



log 



= log 
= loa 



l + erf(^|s|/x*6») 



1 + |s|/x*0 + o(|sp) ,by (Hi 



|s|m*0- 



-6l*/xAt*^ + o(|s|3),by (O. 




Fig. 2. Isoprobability contours of the multivariate Skew-Normal 
distribution, with parameters a = (4.365, —1.455) and the ma- 
trix E is chosen to be diagonal with entries 1.2 and 0.5143. The 
first principal component is shown (PCI), together with the two 
maxima of the cumulant function (MCFl and MCF2). All vectors 
are in arbitrary scale. In this case, the two maxima of the cumu- 
lant function are not related to the first principal component, and 
point towards the (local) large anomalies, in terms of high proba- 
bility (at fixed, and large, vector norm). Probability contours are 



10"\lO"^10" 



,10" 



From Equation (|6]l, it follows that the cumulant function 
Gisi {0) is approximately equal to 



Hf/)0, for small |s| 



(9) 



As previously noticed, the matrix E— /j/x* represents the 
covari ance of the SN distribution lAzzalini and Capitanio 
I999I) . Hence, the maximization of the right hand side of (|9]l 



is equivalent to solving the system defined by (|4|i, but instead 
of working with E, we just need to replace E by E— /x/x* in 
Consequently, the solution to maximize the SN cumulant 
function in the neighborhood of zero is the largest eigenvec- 
tor of the matrix E— /x/x', i.e. the PCI of the SN covariance 
matrix. 

For large |s|, this result does not hold and different direc- 
tions are obtained. We need to recall the asymptotic expan- 
sion of the error function 



l+erf(s) 



^ exp(-s2) 1 + o(s"^) 



, as s J, —CO, 
, as s t +00. 
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If < 0, the logarithm in Equation ^ can be expanded 



as 



log 



2$(yf|s|M*0) 



= log 



l + erf(^|s|/x*0) 

l = |2 



2 2 



-0VM*^ + O(log(|sri)). 



In this case, we have G\^\{e) ~ 1^0* (^S - f At/x*)6/. The 

direction that maximizes G|s| (6) is again a eigenvector, but 
this time is the eigevector of the matrix S — f /x/x*, pointing 
towards fi*6 < 0. 
If fi*9 > 0, we have 



log 2$(yf |s|/x*0) 



iog2 , fi^e > 0, 

,if/x*6/ = 0. 



Then, the cumulant function G|s| {9) can be approximated by 

G|s| (0) ~ -l^^'S^ and maximizes by the largest eigenvec- 
tor of S, pointing towards /x*0 > 0. 

In summary, depending on the size of |s| (small or large) 
and the sign /i*0 (positive or negative), the solutions of 
Equation ^ for the SN distribution can be viewed as the 
largest eigenvectors of three different matrices, /x/x*, S 
and S — f /x/x*. 

For the bivariate example of Figure |2] the PCI is shown 
(in arbitrary scale), explaining 60% of the variance. The sec- 
ond PC (40% of the variance) is orthogonal to PCI and is not 
displayed. Both maxima of the cumulant function for large 
|s|, denoted as MCFl and MCF2 (respectively for /x*0 > 
and /i*0 < ), are presented in Figure |2] for the bivariate 
example (same scale as PCI). The two local maxima point 
towards the large anomalies of the distribution: this can be 
seen by noting that a point at the upper-right end of PC 1 cor- 
responds to a small probability, and hence is less likely to be 
found, than a point at the right end of MCFl (the two points 
being of equal norm). Similarly, a point at the down-left end 
of PCI is less likely, in probability, than a point at the down 
end ofMCF2. 

3.3 Multivariate Gamma vectors 

This section investigates the multivariate Gam ma distribu 



tion d efined by Cheriyan and Ramabhadran (see iKotz et al 



19981) . Each component of the data vector X is distributed 
following a Gamma distribution, and the components depend 
each other by means of an auxiliary variable z. The joint dis- 
tribution is 



giz; ao) Y[ g{xi - z; ai)dz 

i—l 

where g is a gamma distribution, i.e. 

g{z,a) 



(10) 



for z > 0, equal to zero otherwise. A bivariate example is 
presented in Figure [3j with n = 2, ao = 2, ai = 0.5 and 
a2 = 4 in ( fTOb . 




Fig. 3. Isoprobability contours of the multivariate Gamma distri- 
bution, with parameters ao — 2, ai = 0.5, 02 ~ 4. The first 
principal component is shown (PCI), together with the maximum 
of the cumulant function (MCFl). All vectors are in arbitrary scale. 
Again, the maximum of the cumulant function is different from the 
first principal component, and points towards the large anomalies, 
in terms of high probability (at fixed, and large, vector norm). Prob- 
ability contours are 10~^''^10"^10"^''^ . . . 

The cumulant function for this multivariate Gamma distri- 



bution can be written jKotz et al.L 119981) as 

n n 

log{E[cxp(s'X)]} = -aolog(l-^s,)-^aaog(l- 



i=i 



i=l 



and the mean of the z*'' component is /^jj = qq + a;. By 
replacing s by |s| x 9, the cumulant function of the centered 
vector 0* (X — /x) can be written as 



r(a) 



G|s|(0) = -aolog(l-|s|^0, 

n 

- «^l0g(l - - M II («0 + a^)0^. (H) 



i=l 
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For small |s|, the logarithms are approximated by the trun- 
cated Taylor expansions, i.e. 



and 



aolog(l - |s| e,) ~ -|s| ^ aoe, + ^-ao ( ^ 6. 

i—l i—1 i—1 

It follows that the cumulant function can be approximated by 

|2 



(12) 



where the covariance matrix C is defined by iKotz et al 
(Il998h 

■ 

"0 ao + an 

Hence, for small |s|, the solution of (O is the classical PCI 
eigenvector of the covariance matrix C associated with the 
largest eigenvalue. It is plotted in Figure [3] for the bivariate 
example. The negative part of PCI is not displayed, as well 
as the second principal component which is just orthogonal 
to the first. 

For large |s|, the cumulant function is not defined, since 
the logarithms in Equation (fTTl i must have positive argu- 
ments, for all unit vectors 0. In particular, the following two 
inequalities have to be satisfied 



|s|6l, < 1 and |s|^6ii < 1. 



In other words. 



|s| < mm ^mni | — 



However, we take the largest allowed value of |s|, which 
is considered as a valid limit. Since is a unit vector, the 
maximum of each component 9i is 1, which holds when all 
the other components are zero, while the maximum for the 
sum 9i is y^, which holds when 9i =62 = ... = 
9n = \l\pn. The largest allowed value of |s| is then 
Xj^/n: in that case G remains finite for all 0's, except for 
9\ = 9^ = ■ ■ ■ = 9n = \l \fn, where it diverges due to the 
first logarithm of Equation (fTTl i (all the others remain finite). 
For larger values of |s|, G diverges over subspaces larger 
than a single point. Hence the boundary case |s| = ^/y/n 
is taken as representative for a "large" |s| limit, and the point 
9i ~ 92 ~ ■ ■ ■ ~ 9n = is taken as the maximum of 

the cumulant function. 

For the bivariate example, the maximum is plotted in Fig- 
ure [3] denoted as MCFl, and scaled to PCI. Note that for 



high values of the probability distribution (e.g. the smallest 
contour), PCI seems a representative direction of the egg- 
like shape of the distribution, but for low probabilities it be- 
comes clear that the line 9i — 6*2 is responsible for the large 
deviations. A point at the end of PCI is indeed less probable 
than a point at the end of MCFl . 

This result is understood by noting that the joint density 
( [Tol l is the probability distribution of variables Xi defined as 
Xi = 20 + Zi, where the variables zq, zi, . . . , Zn aie indepen- 
dent and Gamma distributed with parameters ao,ai, . . . , an- 
Hence, a large deviation of zq, which occurs independently 
on others z's, corresponds to a large deviation of x which 
is placed on average along the line xi ~ X2 = ■ ■ ■ = Xn 
(that corresponds to 9i = 92 = ■■■ = 9n for the cumulant 
function). 



4 Maximizing the marginal density tail 

We have seen that the multivariate cumulant function reduces 
to the variance if the radius |s| tends to zero. In that case, its 
maxima corresponds to the first principal component of data 
set. If |s| grows, higher order cumulants come into play, but 
is not clear what the corresponding maxima represent. In 
order to clarify this point, we rewrite the cumulant function 
defined by ^ in terms of the explicit integral over the prob- 
ability density 



G|,|(0)=log / /(x)exp(|s|0*x)dx 

This expression can be reduced to an unidimensional inte- 
gral, by defining the projected data as Z = 0*X, and the 
marginal probability density of the projected data, /^(z). 
The cumulant function is then 

/+00 
/0(z)cxp(|s|z)dz, (13) 
-00 

which corresponds to the cumulant function of the univariate 
vector .Z = 0*X with distribution density fgiz). In the light 
of this representation, our maximization procedure is better 
understood: we are looking for directions 6 that correspond 
to a marginal probability density fg{z) displaying maximal 
cumulant function, at fixed |s|. We want to demonstrate 
that if |s| grows, a larger cumulant function corresponds to 
a marginal density with a fatter tail. Hence our procedure 
selects the directions corresponding to the marginal densities 
with fatter tails, where the anomalous behaviour is expected. 

Specifically, consider two different directions 6 and 6': we 
want to demonstrate that if the marginal distribution along 
has a fatter tail than the distribution along 9', then the cumu- 
lant function has also a fatter tail along 6 respect to 6'. More 
formally, we have the following theorem. 

Theorem 1. Let 6 and 6' be two directions. If there exists a 
real z* such that the density distribution Jq^z) of the random 
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variable 6 X is strictly larger than the density fg'{z)for all 
z > z* , i.e. 

feiz) > fo'{z), for all z > z* , 

then there exists a radius |s|* such the cumulant function of 
0*X and 0'*X satisfies 

G|s|W >G|s|(e'), forall\s\ > \s\*. 

Proof. In order to prove the result, we start by noting that the 
following inequality holds 

exp(|s|z)[/0(z) - fg^iz)] > exp(|s|z*)[/0(z) - /^.(z)] 

for all z > z*, where we have replaced the exponential func- 
tion with its minimum value in the interval z e (z*, +00). 
The inequality holds because the density difference fgiz) — 
fg' (z) is positive in this interval, by assumption. Since the 
above inequality holds in the whole interval z e (z*, +00), 
it can be integrated over, i.e. 
+00 



exp(|s|z)[/g,(z) - fg'iz)]dz > 

r+oa 

cxp(|s|z*) / [/g,(z) - /^'(z)]dz. 

J z* 

Given that the two densities are normalized, i.e. 

/+00 /•+00 
fg{z)dz = / fQ'{z)dz = 1 
-00 J —00 

we can rewrite the right hand side (r.h.s.) of the last inequal- 
ity by inverting the integration order and the marginal densi- 
ties as 

exp(|s|z*) / [fgiz)-fg'iz)]dz = 

exp(|s|z*) /" Ifg'iz) - f0iz)]dz 

Rearranging the four terms gives indeed the normalization 
condition times the exponential. Note that since the integral 
in the left hand side (l.h.s.) is, by assumption, positive, the 
integral in the rh.s. must be positive as well, because they 
are multiplied by the same positive constant exp(|s|z*). 

Given the above inequalities, we only need to demonstrate 
that it exists an Isl* such that, for all Isl > Isl*, 



exp(|s|z*) [ [fQ'{z)- fQ{z)\dz > 

J —00 



exp(|s|z) [fg'iz) - fg{z)]dz 



(14) 



Equation (fT4l i would hold trivially for all values of |s|, be- 
cause the exponential in the l.h.s. is always larger than 
that of r.h.s. This corresponds to the case in which, beside 
fgiz) > fg'iz) Vz > z*, we assume also fg{z) < 
fg'{z) Vz < z*. However, we do not need this additional 
request, and we just note that it would help our procedure by 
leaving the problem of using a large | s | . 

Since the exponential is positive, we can rewrite Equation 
( fT4] l as 

r [f0'iz)^fei^)]dz> 

J —00 

r exp{\s\{z-z*))[fgiz)~fg{z)\dz 

J —OQ 

for all |s| > |s|*. The integral in the l.h.s. is positive, as 
stated above, and is independent on |s|, while the integral 
in the r.h.s. converges to zero for |s| — > +00, as long as 
the density difference remains finite, because the exponential 
tends to zero in the whole interval z G (— cx),z*). If we 
define |s|* as the largest possible value of |s| for which the 
two integrals are equal, then for all |s| > |s|* the integral in 
the l.h.s. is larger than that of rh.s. 
Finally, we have for all | s | > | s | * 



+00 



where the value of |s|* must be determined. Note that even 
if the integral in the l.h.s. is positive, the density differ- 
ence fg'{z) — fg{z) is not guaranteed to be positive for 
all z £ (— (X),z*). If the integrand was positive as well. 



dz exp{\s\z)[fg{z) - fg>{z)] > 
dz exp(|s|z)[/0'(z) - fg{z)\. 



By rearranging terms, this is equivalent to G\s\{(^) > 
G|s| {0'), and the theorem is demonstrated. 

□ 



Discussion 

In this paper, we have introduced a novel method selecting 
the spatial patterns representative for the large deviations in 
the dataset. The method consists in finding the vectors in 
the space of data for which the cumulant function is maxi- 
mal. As in the case of PCA, the spatial patterns are found 
as normalized directions in the space of data, and a linear 
projection can be performed, with an easy geometrical in- 
terpretation. If one is interested on the large deviations, the 
project i on all ows to safely perform Extreme Value Analysis 
dColesI (1200 ll) ). In both cases the subspaces are ordered: in 
PCA the order follows the fraction of variance of each sub- 
space; The maxima of the cumulant function are ordered by 
the value of G. However, while PCA accounts for the mass 
of the distribution, the cumulant function can cover the full 
range of the distribution. 

Principal components are always symmetric, while large 
anomalous patterns, if generated by nonlinear processes, are 
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expected to be neither specular nor orthogonal. Accord- 
ingly, the maxima of the cumulant function are not necessar- 
ily symmetric, since they account for the whole structure of 
dependencies, and not only covariances. Other nonsymmet- 
ric techniques, such as obUque Varimax rotations, generally 
depend on the frame of reference: vector solutions do not co- 
vary with the space of data under orthogonal transformations. 
Hence, solutions depend not only on the shape of the under- 
lying probability distribution, but also on its orientation: an 
undesirable property for our purposes. The maxima of the 
cumulant function, instead, vary with the probability density 
whose shape is the only feature determining the maxima. 

In the case of normally distributed data, the maximiza- 
tion of cumulant function yields the first principal compo- 
nent for all values of \s\: the elliptically symmetric distri- 
bution is characterized by the two tails along the major axis 
of the ellipse, i.e. the first principal component. When the 
method is applied to Skew-Normal and Gamma distributions, 
for non-small |s|, the maxima of the cumulant function de- 
termine large anomalies: high probability directions far from 
the center of mass. Note that the limit radius \s\ oo is 
the innovative key from a technical point of view, allowing 
for analytical solutions. Using the limit, we were also able to 
demonstrate that the solutions of our algorithm correspond, 
in general, to the directions along which the marginal proba- 
bility density display the fattest tails. 

Using the cumulant function is computationally cheap, 
there is no free parameter, and has the advantage of search- 
ing for local solutions, all of which are of interest. When a 
solution is found, is always a good solution, in contrast with 
neural networks applications, for which it must be questioned 
if it is the global or just a lo cal solution. In real applications 



(see iBernacchia et al.L l2007h . the radius \s\ must be taken as 



large as possible, until the expected error in the estimate of 
the cumulant f unction, due to the finite sample, reach a toler- 
ance value (see'Berna cchia et"aL , 2007 ). This corresponds to 
maximize a combination of cumulants which is of the high- 
est rehable order with the given amount of data, accounting 
for the available set of anomalies. 

The solutions of our algorithm are expected to transform 
continuously as the radius \s\ varies. Hence, even if the limit 
|s| ^ oo cannot be taken in practice, the solutions for a fi- 
nite value of |s| are expected to represent a substantial depar- 
ture from the PCA solution, towards the formal solution at 
\s\ oo. From the theoretical point of view, future work 
could be devoted to studying more in detail the nature of 
solutions at varying \s\. For instance, one could attempt to 
find under which conditions and to what extent the solutions 
are in between the PCA solution and the formal solution at 
infinite \s\. From t he applicative point of v iew, we expect 



several datasets (e.g. IBernacchia et aliuOOTh to be fruitfully 



analyzed with our new method. 

Note that the logarithm is taken for illustrative purposes: 
the moment generating function could be used instead of the 
cumulant function, since the maximization is invariant under 



application of a monotonous function. The present definition 
is however confortable in avoiding extremely large numbers. 
Centering of data about the mean is also a practical step, re- 
lated with the constraint of dealing with finite samples: if 
the limit |s| ^ oo could be really taken, the mean would be 
irrelevant. 

Results of our procedure are corrupted if variables are 
standardized by a rescaling, since the relative scale of dif- 
ferent directions is the key in detecting anomalies and com- 
paring the size of tails. If variables are standardized, our pro- 
cedure reduce to a special case of Independe nt Component 
Analyisis (ICA, seeiHyvari nen and Oj al l 2007 )). detecting in- 
dependent components rather than large anomalies. Results 
are also corrupted if we try to get an empirical estimate of the 
cumulant function when the underlying probability density 
decays less than exponentially fast. In that case, the cumu- 
lant function diverge, and the variance of t he empir i cal est i- 
mate increases with the size of the sample (iSornette 
implying that the estimate is always unreliable. 
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